AITopics | cascading bandit

Cascading Bandits: Optimizing Recommendation Frequency in Delayed Feedback Environments

Neural Information Processing Systems, Dec-27-2025, 06:19:10 GMT

Delayed feedback is a critical problem in dynamic recommender systems. In practice, the feedback often depends on the frequency of recommendation. Most existing online learning literature does not consider optimizing the recommendation frequency, and treats the reward from each successfully recommended message as equal. In this paper, we consider a novel cascading bandits setting, where individual messages from a selected list are sent to a user periodically. Whenever a user does not like a message, she may abandon the system with a probability positively correlated with the recommendation frequency.

cascading bandit, delayed feedback environment, optimizing recommendation frequency

Neural Information Processing Systems

Industry: Education (0.60)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.40)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.40)

Minimax Regret for Cascading Bandits

Neural Information Processing Systems, Dec-25-2025, 03:11:58 GMT

Cascading bandits is a natural and popular model that frames the task of learning to rank from Bernoulli click feedback in a bandit setting. For the case of unstructured rewards, we prove matching upper and lower bounds for the problem-independent (i.e., gap-free) regret, both of which strictly improve the best known. A key observation is that the hard instances of this problem are those with small mean rewards, i.e., the small click-through rates that are most relevant in practice. Based on this, and the fact that small mean implies small variance for Bernoullis, our key technical result shows that variance-aware confidence sets derived from the Bernstein and Chernoff bounds lead to optimal algorithms (up to log terms), whereas Hoeffding-based algorithms suffer order-wise suboptimal regret. This sharply contrasts with the standard (non-cascading) bandit setting, where the variance-aware algorithms only improve constants. In light of this and as an additional contribution, we propose a variance-aware algorithm for the structured case of linear rewards and show its regret strictly improves the state-of-the-art.
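
To illustrate why variance-awareness matters at small click-through rates, the following minimal sketch compares a Hoeffding confidence radius with a Bernstein-type radius for the mean of n Bernoulli samples. The function names, the use of the true variance p(1 - p) instead of an empirical estimate, and the specific constants are assumptions made for illustration; they are not taken from the paper.

```python
import math


def hoeffding_radius(n, delta):
    """Hoeffding radius for the mean of n samples in [0, 1]; ignores the variance."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def bernstein_radius(variance, n, delta):
    """Bernstein-type radius obtained by inverting Bernstein's inequality.

    Assumption: the true variance is known. A practical algorithm would
    plug in an empirical estimate and pay for it with extra terms.
    """
    log_term = math.log(1.0 / delta)
    return math.sqrt(2.0 * variance * log_term / n) + 2.0 * log_term / (3.0 * n)


def compare(p, n=1000, delta=0.05):
    """Return (Hoeffding radius, Bernstein radius) for a Bernoulli(p) mean."""
    return hoeffding_radius(n, delta), bernstein_radius(p * (1.0 - p), n, delta)


if __name__ == "__main__":
    for p in (0.01, 0.5):
        h, b = compare(p)
        print(f"p={p}: hoeffding={h:.4f}, bernstein={b:.4f}")
```

With these settings the Bernstein-type radius is several times smaller than the Hoeffding radius at p = 0.01, and slightly larger at p = 0.5, which is the small-mean, small-variance effect the abstract points to.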

cascading bandit, minimax regret, name change

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Best Arm Identification for Cascading Bandits in the Fixed Confidence Setting

Zhong, Zixin, Cheung, Wang Chi, Tan, Vincent Y. F.

arXiv.org Machine Learning, Jan-23-2020

We design and analyze CascadeBAI, an algorithm for finding the best set of $K$ items, also called an arm, within the framework of cascading bandits. An upper bound on the time complexity of CascadeBAI is derived by overcoming a crucial analytical challenge, namely, that of probabilistically estimating the amount of available feedback at each step. To do so, we define a new class of random variables (r.v.'s) which we term left-sided sub-Gaussian r.v.'s; these are r.v.'s whose cumulant generating functions (CGFs) can be bounded by a quadratic only for non-positive arguments of the CGFs. This enables the application of a sufficiently tight Bernstein-type concentration inequality. We show, through the derivation of a lower bound on the time complexity, that the performance of CascadeBAI is optimal in some practical regimes. Finally, extensive numerical simulations corroborate the efficacy of CascadeBAI as well as the tightness of our upper bound on its time complexity.
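
The verbal definition above can be written out as follows. This is a sketch of one natural formalization: the centering at the mean, the parameter $\sigma$, and the factor $1/2$ are assumptions, and the paper's exact definition may differ. The second line is the standard Chernoff argument showing that such a one-sided CGF bound controls only the lower tail.

```latex
% Assumed formalization: X is sigma-left-sided sub-Gaussian if its centered
% CGF is bounded by a quadratic for non-positive arguments only.
\log \mathbb{E}\!\left[ e^{\lambda (X - \mathbb{E}X)} \right]
  \le \frac{\sigma^2 \lambda^2}{2}
  \qquad \text{for all } \lambda \le 0 .

% Lower-tail bound via Chernoff, optimizing at \lambda = -\varepsilon / \sigma^2:
\Pr\!\left( X - \mathbb{E}X \le -\varepsilon \right)
  \le \inf_{\lambda \le 0} \exp\!\left( \lambda \varepsilon + \frac{\sigma^2 \lambda^2}{2} \right)
  = \exp\!\left( -\frac{\varepsilon^2}{2 \sigma^2} \right),
  \qquad \varepsilon \ge 0 .
```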

best arm identification, lemma 5, probability

arXiv.org Machine Learning

2001.08655

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Cascading Bandits: Learning to Rank in the Cascade Model

Kveton, Branislav, Szepesvari, Csaba, Wen, Zheng, Ashkan, Azin

arXiv.org Machine Learning, May-18-2015

A search engine usually outputs a list of $K$ web pages. The user examines this list, from the first web page to the last, and chooses the first attractive page. This model of user behavior is known as the cascade model. In this paper, we propose cascading bandits, a learning variant of the cascade model where the objective is to identify the $K$ most attractive items. We formulate our problem as a stochastic combinatorial partial monitoring problem. We propose two algorithms for solving it, CascadeUCB1 and CascadeKL-UCB. We also prove gap-dependent upper bounds on the regret of these algorithms and derive a lower bound on the regret in cascading bandits. The lower bound matches the upper bound of CascadeKL-UCB up to a logarithmic factor. We experiment with our algorithms on several problems. The algorithms perform surprisingly well even when our modeling assumptions are violated.
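
The cascade model and a UCB-style learner for it can be sketched as below. This is a simplified illustration in the spirit of CascadeUCB1, not the paper's exact algorithm: the function and variable names are hypothetical, the constant `radius_const=1.5` is an assumed choice of confidence radius, and unobserved items are given an infinite index in place of a dedicated initialization phase.

```python
import math
import random


def simulate_click(ranked_items, attraction_probs, rng):
    """Cascade model: the user scans the list top-down and clicks the
    first attractive item. Returns its position, or None if no click."""
    for position, item in enumerate(ranked_items):
        if rng.random() < attraction_probs[item]:
            return position
    return None


def cascade_ucb1(attraction_probs, K, horizon, seed=0, radius_const=1.5):
    """UCB-style learner for cascading bandits (illustrative sketch).

    attraction_probs is used only by the simulator; the learner sees
    click positions alone. Returns (pulls, means) per item.
    """
    rng = random.Random(seed)
    L = len(attraction_probs)
    pulls = [0] * L      # number of times each item was examined
    means = [0.0] * L    # empirical attraction probability per item

    for t in range(1, horizon + 1):
        def ucb(e):
            if pulls[e] == 0:
                return float("inf")  # assumption: force initial exploration
            return means[e] + math.sqrt(radius_const * math.log(t) / pulls[e])

        # Recommend the K items with the largest upper confidence bounds.
        ranked = sorted(range(L), key=ucb, reverse=True)[:K]
        click = simulate_click(ranked, attraction_probs, rng)

        # Only items up to and including the click were examined.
        last_examined = click if click is not None else K - 1
        for position in range(last_examined + 1):
            e = ranked[position]
            pulls[e] += 1
            reward = 1.0 if position == click else 0.0
            means[e] += (reward - means[e]) / pulls[e]

    return pulls, means


if __name__ == "__main__":
    pulls, means = cascade_ucb1([0.9, 0.1, 0.1, 0.8, 0.1], K=2, horizon=5000)
    print(pulls)
    print([round(m, 3) for m in means])
```

The update loop reflects the partial feedback of the cascade model: items ranked below the clicked position are never examined, so their statistics are left unchanged in that round.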

artificial intelligence, data mining, machine learning

arXiv.org Machine Learning

1502.02763

Country: North America > United States > California > Santa Clara County (0.28)

Genre: Research Report (0.64)

Technology: